{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 模型在线使用\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "在[03-split-to-sub-graph.ipynb](./03-split-to-sub-graph.ipynb)中保存的模型是TensorFlow标准的[SavedModel](https://www.tensorflow.org/guide/saved_model)格式，下面将在前文demo的基础上，继续介绍如何将离线的模型部署到线上提供服务。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. 编译TensorFlow以支持XLA"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "使用pip install的TensorFlow默认没有打开XLA，要使用XLA的功能需要自行通过源码编译TensorFlow并安装。编译命令如下：\n",
    "\n",
    "    bazel build --config=opt --define=with_xla_support=true //tensorflow/tools/pip_package:build_pip_package"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. 编译模型"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "这里使用的模型是在[03-split-to-sub-graph.ipynb](./03-split-to-sub-graph.ipynb)中保存的模型，如下"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "MODEL_DIR = '/tmp/wide-deep-test/model'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "/tmp/wide-deep-test/model/saved_model\r\n",
      "├── assets\r\n",
      "├── saved_model.pb\r\n",
      "└── variables\r\n",
      "    ├── variables.data-00000-of-00001\r\n",
      "    └── variables.index\r\n",
      "\r\n",
      "2 directories, 3 files\r\n"
     ]
    }
   ],
   "source": [
    "!tree $MODEL_DIR/saved_model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "安装TensorFlow包时会默认安装`saved_model_cli`到anaconda的bin目录里面，请将anaconda的bin目录放到你的`PATH`环境变量里面以便可以找到这个命令。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "TENSORNET_SOURCE_CODE_DIR='/da2/zhangyansheng/tensornet' # 请在此处更改tensornet的源码位置\n",
    "GRAPH_HEADER_OUTPUT_DIR=TENSORNET_SOURCE_CODE_DIR + '/' + 'examples/online_serving' "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2020-09-08 10:11:23.691866: I tensorflow/core/grappler/devices.cc:60] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0 (Note: TensorFlow was not compiled with CUDA support)\n",
      "2020-09-08 10:11:23.692053: I tensorflow/core/grappler/clusters/single_machine.cc:356] Starting new session\n",
      "2020-09-08 10:11:23.702353: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 2593325000 Hz\n",
      "2020-09-08 10:11:23.704053: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f38338cfa30 initialized for platform Host (this does not guarantee that XLA will be used). Devices:\n",
      "2020-09-08 10:11:23.704083: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version\n",
      "2020-09-08 10:11:23.783678: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:563] model_pruner failed: Invalid argument: Invalid input graph.\n",
      "2020-09-08 10:11:23.788314: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:797] Optimization results for grappler item: graph_to_optimize\n",
      "2020-09-08 10:11:23.788345: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:799]   model_pruner: Graph size after: 32 nodes (0), 46 edges (0), time = 0.481ms.\n",
      "2020-09-08 10:11:23.788360: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:799]   implementation_selector: Graph size after: 32 nodes (0), 46 edges (0), time = 0.332ms.\n",
      "2020-09-08 10:11:23.788373: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:799]   function_optimizer: Graph size after: 169 nodes (137), 304 edges (258), time = 15.093ms.\n",
      "2020-09-08 10:11:23.788385: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:799]   constant_folding: Graph size after: 83 nodes (-86), 131 edges (-173), time = 33.939ms.\n",
      "2020-09-08 10:11:23.788394: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:799]   shape_optimizer: shape_optimizer did nothing. time = 0.24ms.\n",
      "2020-09-08 10:11:23.788404: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:799]   arithmetic_optimizer: Graph size after: 82 nodes (-1), 132 edges (1), time = 2.315ms.\n",
      "2020-09-08 10:11:23.788413: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:799]   layout: layout did nothing. time = 0.092ms.\n",
      "2020-09-08 10:11:23.788423: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:799]   remapper: Graph size after: 82 nodes (0), 132 edges (0), time = 0.667ms.\n",
      "2020-09-08 10:11:23.788432: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:799]   loop_optimizer: Graph size after: 82 nodes (0), 132 edges (0), time = 0.656ms.\n",
      "2020-09-08 10:11:23.788442: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:799]   dependency_optimizer: Graph size after: 51 nodes (-31), 65 edges (-67), time = 1.282ms.\n",
      "2020-09-08 10:11:23.788451: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:799]   memory_optimizer: Graph size after: 51 nodes (0), 65 edges (0), time = 2.872ms.\n",
      "2020-09-08 10:11:23.788460: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:799]   model_pruner: Invalid argument: Invalid input graph.\n",
      "2020-09-08 10:11:23.788470: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:799]   implementation_selector: Graph size after: 51 nodes (0), 65 edges (0), time = 0.091ms.\n",
      "2020-09-08 10:11:23.788479: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:799]   function_optimizer: function_optimizer did nothing. time = 0.044ms.\n",
      "2020-09-08 10:11:23.788489: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:799]   constant_folding: Graph size after: 51 nodes (0), 65 edges (0), time = 1.381ms.\n",
      "2020-09-08 10:11:23.788498: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:799]   shape_optimizer: shape_optimizer did nothing. time = 0.058ms.\n",
      "2020-09-08 10:11:23.788508: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:799]   arithmetic_optimizer: Graph size after: 51 nodes (0), 65 edges (0), time = 1.301ms.\n",
      "2020-09-08 10:11:23.788517: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:799]   remapper: Graph size after: 51 nodes (0), 65 edges (0), time = 0.446ms.\n",
      "2020-09-08 10:11:23.788527: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:799]   dependency_optimizer: Graph size after: 51 nodes (0), 65 edges (0), time = 0.66ms.\n",
      "INFO:tensorflow:Restoring parameters from /tmp/wide-deep-test/model/saved_model/variables/variables\n",
      "WARNING:tensorflow:From /da2/zhangyansheng/tf_package/anaconda3/lib/python3.7/site-packages/tensorflow/python/tools/saved_model_aot_compile.py:332: convert_variables_to_constants (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.\n",
      "Instructions for updating:\n",
      "Use `tf.compat.v1.graph_util.convert_variables_to_constants`\n",
      "WARNING:tensorflow:From /da2/zhangyansheng/tf_package/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/graph_util_impl.py:359: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.\n",
      "Instructions for updating:\n",
      "Use `tf.compat.v1.graph_util.extract_sub_graph`\n",
      "INFO:tensorflow:Froze 8 variables.\n",
      "INFO:tensorflow:Converted 8 variables to const ops.\n",
      "INFO:tensorflow:Writing graph def to: /tmp/saved_model_clipsgpbteg/frozen_graph.pb\n",
      "INFO:tensorflow:Writing config_pbtxt to: /tmp/saved_model_clipsgpbteg/config.pbtxt\n",
      "INFO:tensorflow:Generating XLA AOT artifacts in: /da2/zhangyansheng/tensornet/examples/online_serving\n"
     ]
    }
   ],
   "source": [
    "!saved_model_cli aot_compile_cpu \\\n",
    "             --dir $MODEL_DIR/saved_model  \\\n",
    "             --tag_set serve \\\n",
    "             --output_prefix $GRAPH_HEADER_OUTPUT_DIR/graph \\\n",
    "             --cpp_class Graph"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "/da2/zhangyansheng/tensornet/examples/online_serving\r\n",
      "├── BUILD\r\n",
      "├── graph.cc\r\n",
      "├── graph.h\r\n",
      "├── graph_makefile.inc\r\n",
      "├── graph_metadata.o\r\n",
      "├── graph.o\r\n",
      "├── main.cc\r\n",
      "├── random.h\r\n",
      "└── test_env\r\n",
      "    └── data\r\n",
      "        ├── feature.data\r\n",
      "        └── slot.data\r\n",
      "\r\n",
      "2 directories, 10 files\r\n"
     ]
    }
   ],
   "source": [
    "!tree $GRAPH_HEADER_OUTPUT_DIR"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. 调用模型"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "编写graph.cc调用模型，在源码`examples/online_serving`目录下我们已经编写好了一个例子，可以参考使用。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\r\n",
      "#include \"examples/online_serving/graph.h\"\r\n",
      "\r\n",
      "#include <vector>\r\n",
      "\r\n",
      "#define EIGEN_USE_THREADS\r\n",
      "#define EIGEN_USE_CUSTOM_THREAD_POOL\r\n",
      "\r\n",
      "#include \"third_party/eigen3/unsupported/Eigen/CXX11/Tensor\"\r\n",
      "\r\n",
      "extern \"C\" int Run(const std::vector<std::vector<std::vector<float>>>& input,\r\n",
      "                   std::vector<float>& output) {\r\n",
      "    Eigen::ThreadPool tp(std::thread::hardware_concurrency());\r\n",
      "    Eigen::ThreadPoolDevice device(&tp, tp.NumThreads());\r\n",
      "    Graph graph;\r\n",
      "    graph.set_thread_pool(&device);\r\n",
      "\r\n",
      "    std::vector<int> dim = {1, 8};\r\n",
      "    for (size_t i = 0; i < input.size(); ++i) {\r\n",
      "        if (input[i].size() != Graph::kNumArgs / 2) {\r\n",
      "            std::cerr << \"TFFeaValues size is wrong, expected \" << Graph::kNumArgs / 2 << \" but get\" << input[i].size() << std::endl;\r\n",
      "            return -1;\r\n",
      "        }\r\n",
      "        if ((int)input[i][0].size() != dim[0] + dim[1]) {\r\n",
      "            std::cerr << \"embedding size is wrong, expected \" << dim[0] + dim[1] << \" but get\" << input[i][0].size() << std::endl;\r\n",
      "            return -1;\r\n",
      "        }\r\n",
      "\r\n",
      "        std::copy(input[i][0].data(), input[i][0].data() + dim[0], graph.arg_feed_wide_emb_0_data() + i * dim[0]);\r\n",
      "        std::copy(input[i][0].data() + dim[0], input[i][0].data() + dim[0] + dim[1], graph.arg_feed_deep_emb_0_data() + i * dim[1]);\r\n",
      "\r\n",
      "        std::copy(input[i][1].data(), input[i][1].data() + dim[0], graph.arg_feed_wide_emb_1_data() + i * dim[0]);\r\n",
      "        std::copy(input[i][1].data() + dim[0], input[i][1].data() + dim[0] + dim[1], graph.arg_feed_deep_emb_1_data() + i * dim[1]);\r\n",
      "\r\n",
      "        std::copy(input[i][2].data(), input[i][2].data() + dim[0], graph.arg_feed_wide_emb_2_data() + i * dim[0]);\r\n",
      "        std::copy(input[i][2].data() + dim[0], input[i][2].data() + dim[0] + dim[1], graph.arg_feed_deep_emb_2_data() + i * dim[1]);\r\n",
      "\r\n",
      "        std::copy(input[i][3].data(), input[i][3].data() + dim[0], graph.arg_feed_wide_emb_3_data() + i * dim[0]);\r\n",
      "        std::copy(input[i][3].data() + dim[0], input[i][3].data() + dim[0] + dim[1], graph.arg_feed_deep_emb_3_data() + i * dim[1]);\r\n",
      "\r\n",
      "    }\r\n",
      "\r\n",
      "    auto ok = graph.Run();\r\n",
      "    if (!ok) {\r\n",
      "        return -1;\r\n",
      "    }\r\n",
      "\r\n",
      "    output.clear();\r\n",
      "    output.assign(input.size(), 0.0);\r\n",
      "    std::copy(graph.result0_data(), graph.result0_data() + input.size(), output.data());\r\n",
      "\r\n",
      "    return 0;\r\n",
      "}\r\n",
      "\r\n"
     ]
    }
   ],
   "source": [
    "!cat $GRAPH_HEADER_OUTPUT_DIR/graph.cc"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. 编译最终使用的动态库libmodel.so"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- 执行下面命令编译libmodel.so\n",
    "\n",
    "```bash\n",
    "cd $TENSORNET_SOURCE_CODE_DIR && bazel build -c opt //examples/online_serving:libmodel.so\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5. 编译tf_serving，调用libmodel.so进行预测"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- 编译tf_serving\n",
    "    \n",
    "```bash\n",
    "cd $TENSORNET_SOURCE_CODE_DIR && bazel build -c opt //examples/online_serving:tf_serving\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "预测可以直接通过dlopen调用`libmodel.so`即可， `Run` 函数对应graph.cc中定义的 `Run` 函数。具体的调用方式可以参考 `$TENSORNET_SOURCE_CODE_DIR/examples/online_serving/main.cc` 。\n",
    "\n",
    "  **注意：**下面例子没有实现embedding lookup的功能，使用随机数代替真实的embedding。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "// Copyright (c) 2020, Qihoo, Inc.  All rights reserved.\r\n",
      "//\r\n",
      "// Licensed under the Apache License, Version 2.0 (the \"License\");\r\n",
      "// you may not use this file except in compliance with the License.\r\n",
      "// You may obtain a copy of the License at\r\n",
      "//\r\n",
      "//       http://www.apache.org/licenses/LICENSE-2.0\r\n",
      "//\r\n",
      "// Unless required by applicable law or agreed to in writing, software\r\n",
      "// distributed under the License is distributed on an \"AS IS\" BASIS,\r\n",
      "// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\r\n",
      "// See the License for the specific language governing permissions and\r\n",
      "// limitations under the License.\r\n",
      "\r\n",
      "#include <iostream>\r\n",
      "#include <chrono>\r\n",
      "#include <fstream>\r\n",
      "#include <string>\r\n",
      "#include <vector>\r\n",
      "#include <map>\r\n",
      "#include <boost/algorithm/string.hpp>\r\n",
      "#include <dlfcn.h>\r\n",
      "\r\n",
      "#include \"random.h\"\r\n",
      "\r\n",
      "using namespace std::chrono;\r\n",
      "\r\n",
      "typedef int (*RUN_FUNC)(const std::vector<std::vector<std::vector<float>>>&,\r\n",
      "                        std::vector<float>&);\r\n",
      "\r\n",
      "const int k_batch_size = 32;\r\n",
      "\r\n",
      "void InitWeight(int dim, std::vector<float>& weight) {\r\n",
      "    weight.clear();\r\n",
      "    auto& reng = tensornet::local_random_engine();\r\n",
      "    auto distribution = std::normal_distribution<float>(0, 1 / sqrt(dim));\r\n",
      "\r\n",
      "    for (int i = 0; i < dim; ++i) {\r\n",
      "        weight.push_back(distribution(reng) * 0.001);\r\n",
      "    }\r\n",
      "}\r\n",
      "\r\n",
      "int combine_fea(std::vector<std::vector<float>> emb_feas, std::vector<float>& merged_feas) {\r\n",
      "    if (emb_feas.size() % 2 != 0) {\r\n",
      "        std::cerr << \"combine_fea error.\" << std::endl;\r\n",
      "        return -1;\r\n",
      "    }\r\n",
      "    float wide_lr = 0.0;\r\n",
      "    std::vector<float> dnn_vec(8, 0.0);\r\n",
      "    for (size_t i = 0; i < emb_feas.size(); i++) {\r\n",
      "        if (i % 2 == 0) {\r\n",
      "            wide_lr += emb_feas[i][0];\r\n",
      "        } else if (i % 2 == 1) {\r\n",
      "            for (size_t j = 0; j < emb_feas[i].size(); ++j) {\r\n",
      "                dnn_vec[j] += emb_feas[i][j];\r\n",
      "            }\r\n",
      "        }\r\n",
      "    }\r\n",
      "\r\n",
      "    int count = emb_feas.size() / 2;\r\n",
      "    merged_feas.push_back(wide_lr / count);\r\n",
      "    for (auto& w : dnn_vec) {\r\n",
      "        merged_feas.push_back(w / count);\r\n",
      "    }\r\n",
      "\r\n",
      "    return 0;\r\n",
      "}\r\n",
      "\r\n",
      "int emb_lookup(const std::vector<std::vector<std::vector<uint64_t> > >& inputs,\r\n",
      "               std::vector<std::vector<std::vector<float>>>& emb_inputs) {\r\n",
      "    for (size_t b = 0; b < inputs.size(); ++b) {\r\n",
      "        std::vector<std::vector<float>> emb_slots;\r\n",
      "        for (size_t s = 0; s < inputs[b].size(); ++s) {\r\n",
      "            int input_slot = s;\r\n",
      "            std::vector<std::vector<float>> emb_feas;\r\n",
      "            for (size_t f = 0; f < inputs[b][input_slot].size(); ++f) {\r\n",
      "                std::vector<float> weight;\r\n",
      "                // TODO : seek embedding\r\n",
      "                InitWeight(1, weight);\r\n",
      "                emb_feas.emplace_back(std::move(weight));\r\n",
      "                InitWeight(8, weight);\r\n",
      "                emb_feas.emplace_back(std::move(weight));\r\n",
      "            }\r\n",
      "            std::vector<float> merged_feas;\r\n",
      "            combine_fea(emb_feas, merged_feas);\r\n",
      "            emb_slots.emplace_back(std::move(merged_feas));\r\n",
      "        }\r\n",
      "        emb_inputs.emplace_back(std::move(emb_slots));\r\n",
      "    }\r\n",
      "\r\n",
      "    return 0;\r\n",
      "}\r\n",
      "\r\n",
      "int main() {\r\n",
      "    std::string train_slot = \"./data/slot.data\";\r\n",
      "    std::ifstream slot_if(train_slot);\r\n",
      "    if (!slot_if.is_open()) {\r\n",
      "        return -1;\r\n",
      "    }\r\n",
      "    std::string input;\r\n",
      "    getline(slot_if, input);\r\n",
      "    std::vector<std::string> slots_vec;\r\n",
      "    boost::split(slots_vec, input, boost::is_any_of(\",\"));\r\n",
      "    std::map<int, int> slot2pos;\r\n",
      "    for (size_t i = 0; i < slots_vec.size(); ++i) {\r\n",
      "        slot2pos[std::stoi(slots_vec[i])] = i;\r\n",
      "    }\r\n",
      "\r\n",
      "    void* handle = dlopen(\"./libmodel.so\", RTLD_LAZY);\r\n",
      "    if (handle == NULL) {\r\n",
      "        std::cerr << \"dlopen error.\" << std::endl;\r\n",
      "        return -1;\r\n",
      "    }\r\n",
      "    RUN_FUNC run_func = reinterpret_cast<RUN_FUNC> (dlsym(handle, \"Run\"));\r\n",
      "    if (run_func == NULL) {\r\n",
      "        std::cerr << \"get Run error.\" << std::endl;\r\n",
      "        return -1;\r\n",
      "    }\r\n",
      "\r\n",
      "\r\n",
      "    std::string file_name = \"./data/feature.data\";\r\n",
      "    std::ifstream data_if(file_name);\r\n",
      "    if (!data_if.is_open()) {\r\n",
      "        std::cerr << \"open input error.\" << std::endl;\r\n",
      "        return -1;\r\n",
      "    }\r\n",
      "\r\n",
      "    std::vector<std::vector<std::vector<uint64_t> > > inputs;\r\n",
      "    std::vector<std::vector<std::vector<float>>> emb_inputs;\r\n",
      "    while(getline(data_if, input)) {\r\n",
      "        std::vector<std::vector<uint64_t> > one_input;\r\n",
      "        one_input.assign(slot2pos.size(), {});\r\n",
      "        std::vector<std::string> vec;\r\n",
      "        boost::split(vec, input, boost::is_any_of(\"\\t\"));\r\n",
      "        for (size_t i = 1; i < vec.size(); ++i) {\r\n",
      "            std::vector<uint64_t> feas;\r\n",
      "            std::vector<std::string> slot_feas;\r\n",
      "            boost::split(slot_feas, vec[i], boost::is_any_of(\"\\001\"));\r\n",
      "            int index;\r\n",
      "            //std::cout << \"slot fea:\" << slot_feas[0] << std::endl;\r\n",
      "            if (slot2pos.count(std::stoi(slot_feas[0])) > 0) {\r\n",
      "                index = slot2pos[std::stoi(slot_feas[0])];\r\n",
      "                std::vector<std::string> feas_vec;\r\n",
      "                boost::split(feas_vec, slot_feas[1], boost::is_any_of(\"\\002\"));\r\n",
      "                for (auto& fea : feas_vec) {\r\n",
      "                    feas.push_back(std::stoull(fea));\r\n",
      "                }\r\n",
      "            } else {\r\n",
      "                continue;\r\n",
      "            }\r\n",
      "            one_input[index] = feas;\r\n",
      "        }\r\n",
      "        inputs.emplace_back(one_input);\r\n",
      "\r\n",
      "        if (inputs.size() % k_batch_size == 0) {\r\n",
      "            emb_lookup(inputs, emb_inputs);\r\n",
      "            std::vector<float> outputs;\r\n",
      "            auto start = system_clock::now();\r\n",
      "            run_func(emb_inputs, outputs);\r\n",
      "            auto end   = system_clock::now();\r\n",
      "            auto duration = duration_cast<microseconds>(end - start);\r\n",
      "\r\n",
      "            for (size_t i = 0; i < outputs.size(); ++i) {\r\n",
      "                std::cout << outputs[i] << std::endl;\r\n",
      "            }\r\n",
      "            inputs.clear();\r\n",
      "            outputs.clear();\r\n",
      "            emb_inputs.clear();\r\n",
      "        }\r\n",
      "    }\r\n",
      "\r\n",
      "    if (inputs.size() != 0) {\r\n",
      "        emb_lookup(inputs, emb_inputs);\r\n",
      "        std::vector<float> outputs;\r\n",
      "        auto start = system_clock::now();\r\n",
      "        run_func(emb_inputs, outputs);\r\n",
      "        auto end   = system_clock::now();\r\n",
      "        auto duration = duration_cast<microseconds>(end - start);\r\n",
      "\r\n",
      "        for (auto w : outputs) {\r\n",
      "            std::cout <<  w << std::endl;\r\n",
      "        }\r\n",
      "    }\r\n",
      "\r\n",
      "    return 0;\r\n",
      "}\r\n"
     ]
    }
   ],
   "source": [
    "!cat $TENSORNET_SOURCE_CODE_DIR/examples/online_serving/main.cc"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6. 运行"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "在`$TENSORNET_SOURCE_CODE_DIR/examples/online_serving/test_env/`目录下我们已经放置了一部分测试数据，我们需要将编译好的`tf_serving`和`libmodel.so`拷贝到`$TENSORNET_SOURCE_CODE_DIR/examples/online_serving/test_env/`目录下以便运行。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "!cp -f $TENSORNET_SOURCE_CODE_DIR/bazel-bin/examples/online_serving/libmodel.so $TENSORNET_SOURCE_CODE_DIR/examples/online_serving/test_env/\n",
    "!cp -f $TENSORNET_SOURCE_CODE_DIR/bazel-bin/examples/online_serving/tf_serving $TENSORNET_SOURCE_CODE_DIR/examples/online_serving/test_env/"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "/da2/zhangyansheng/tensornet/examples/online_serving/test_env/\r\n",
      "├── data\r\n",
      "│   ├── feature.data\r\n",
      "│   └── slot.data\r\n",
      "├── libmodel.so\r\n",
      "└── tf_serving\r\n",
      "\r\n",
      "1 directory, 4 files\r\n"
     ]
    }
   ],
   "source": [
    "!tree $TENSORNET_SOURCE_CODE_DIR/examples/online_serving/test_env/"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "  slot.data中slot顺序需要和wide_deep.py中的WIDE_SLOTS和DEEP_SLOTS顺序一致"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " 1,2,3,4\r\n"
     ]
    }
   ],
   "source": [
    "!cat $TENSORNET_SOURCE_CODE_DIR/examples/online_serving/test_env/data/slot.data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "  feature.data数据按照main.cc中解析的格式构造即可，下面有些特殊分隔符显示不对。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0\t1\u0001-1956697246319764053\t2\u0001-5730244542641024933\t3\u0001-9118175470622903910\u0002-8448113875518360108\t4\u0001-2457261431940944054\r\n",
      "0\t1\u0001-1956697246319764053\t2\u0001-1160342140770244045\t3\u0001-928246004650238241\u00026746402536336743089\t4\u00018069461642963552018\r\n",
      "0\t1\u0001-1956697246319764053\t2\u0001-7395780378584928338\t3\u0001-8198260618690915435\u00025518680928552928316\t4\u00011517509480232003656\r\n",
      "0\t1\u0001-1956697246319764053\t2\u00015418959032072182947\t3\u0001-7328057248110740505\u00022638487947984888231\t4\u0001-3889211362256458670\r\n",
      "0\t1\u0001-1956697246319764053\t2\u0001-1447814788092430700\t3\u00011314136415692403795\u00025677372407420628171\t4\u00018002349267150817951\r\n",
      "0\t1\u0001-1956697246319764053\t2\u0001-3444864941673928379\t3\u00014367684639007080916\u00028517447599233938262\t4\u00017249151252390487352\r\n",
      "0\t1\u0001-1956697246319764053\t2\u0001-25308817390645703\t3\u0001-7048581416920648308\u0002415903991668524816\t4\u00011517509480232003656\r\n",
      "0\t1\u0001-1956697246319764053\t2\u0001-3444864941673928379\t3\u00015380535544434298816\u00024526355427571578606\t4\u0001-4517525648429849306\r\n",
      "0\t1\u0001-1956697246319764053\t2\u0001-3444864941673928379\t3\u00015380535544434298816\u00025299058237759487470\t4\u0001-4614089404967711381\r\n"
     ]
    }
   ],
   "source": [
    "!cat $TENSORNET_SOURCE_CODE_DIR/examples/online_serving/test_env/data/feature.data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "  执行`tf_serving`，输出预测结果。由于此demo没有使用真实的embedding，所以预测结果不可信。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.500834\r\n",
      "0.500769\r\n",
      "0.500813\r\n",
      "0.5008\r\n",
      "0.500809\r\n",
      "0.500836\r\n",
      "0.500773\r\n",
      "0.500765\r\n",
      "0.500878\r\n"
     ]
    }
   ],
   "source": [
    "!cd $TENSORNET_SOURCE_CODE_DIR/examples/online_serving/test_env/ && ./tf_serving"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 总结"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "上面我们展示了如何使用XLA做在线预估，这个代码只供参考学习，真实在线使用需要再做优化。其中可以看到，在线预估时：\n",
    "\n",
    "1. 我们省去了embedding lookup的sub graph，在线实现可以更加容易的嵌入到业务代码中。\n",
    "2. embedding的数据会单独保存到字典中，在线自行查询。\n",
    "\n",
    "在[05-export-feature-embedding.ipynb](./05-export-feature-embedding.ipynb)一节中我们会说明如何将sparse的embedding数据转换成字典以供在线使用。"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
